Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 426 | 432 |
| Missing cells (%) | 8.0% | 8.1% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
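The missing-cell percentages above can be reproduced from the other figures in the table. The sketch below assumes the percentage is the number of missing cells divided by the total number of cells (observations times variables). The function name `missing_cell_pct` is illustrative and not part of the profiling tool.

```python
# Figures copied from the Dataset statistics table.
N_VARIABLES = 12
N_OBSERVATIONS = 446
MISSING_CELLS = {"Dataset A": 426, "Dataset B": 432}


def missing_cell_pct(missing: int, n_rows: int, n_cols: int) -> float:
    """Return missing cells as a percentage of all cells (rows x columns)."""
    return 100.0 * missing / (n_rows * n_cols)


for name, missing in MISSING_CELLS.items():
    pct = missing_cell_pct(missing, N_OBSERVATIONS, N_VARIABLES)
    print(f"{name}: {pct:.1f}%")
```

Under that assumption, both datasets round to the percentages reported in the table.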
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
| Dataset A | Dataset B | |
|---|---|---|
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High Correlation |
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High Correlation |
| Age has 84 (18.8%) missing values | Age has 88 (19.7%) missing values | Missing |
| Cabin has 341 (76.5%) missing values | Cabin has 342 (76.7%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 294 (65.9%) zeros | SibSp has 306 (68.6%) zeros | Zeros |
| Parch has 334 (74.9%) zeros | Parch has 348 (78.0%) zeros | Zeros |
| Fare has 6 (1.3%) zeros | Fare has 9 (2.0%) zeros | Zeros |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2023-11-22 10:41:21.993689 | 2023-11-22 10:41:26.822515 |
| Analysis finished | 2023-11-22 10:41:26.821441 | 2023-11-22 10:41:30.213952 |
| Duration | 4.83 seconds | 3.39 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
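The Duration row follows from the start and finish timestamps. A minimal sketch of that subtraction, using only the timestamps shown in the table; the helper `duration_seconds` is a hypothetical name chosen for this example.

```python
from datetime import datetime

# Timestamp layout used in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# (analysis started, analysis finished), copied from the table.
runs = {
    "Dataset A": ("2023-11-22 10:41:21.993689", "2023-11-22 10:41:26.821441"),
    "Dataset B": ("2023-11-22 10:41:26.822515", "2023-11-22 10:41:30.213952"),
}


def duration_seconds(started: str, finished: str) -> float:
    """Return the elapsed time between two timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


for name, (started, finished) in runs.items():
    print(f"{name}: {duration_seconds(started, finished):.2f} seconds")
```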
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 457.05605 | 455.97085 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| Maximum | 891 | 890 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| 5-th percentile | 40.5 | 48.75 |
| Q1 | 237 | 230.25 |
| median | 461.5 | 458 |
| Q3 | 686.75 | 681.75 |
| 95-th percentile | 846.75 | 855.75 |
| Maximum | 891 | 890 |
| Range | 890 | 889 |
| Interquartile range (IQR) | 449.75 | 451.5 |
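Range and Interquartile range (IQR) are derived from the other rows of the quantile table: range is maximum minus minimum, and IQR is Q3 minus Q1. The short check below uses the PassengerId values above; the dictionary layout and the `spread` helper are illustrative only.

```python
# Quantile figures for PassengerId, copied from the table.
quantiles = {
    "Dataset A": {"min": 1, "q1": 237, "q3": 686.75, "max": 891},
    "Dataset B": {"min": 1, "q1": 230.25, "q3": 681.75, "max": 890},
}


def spread(q: dict) -> dict:
    """Return the range (max - min) and the interquartile range (Q3 - Q1)."""
    return {"range": q["max"] - q["min"], "iqr": q["q3"] - q["q1"]}


for name, q in quantiles.items():
    print(name, spread(q))
```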
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 259.63879 | 260.86626 |
| Coefficient of variation (CV) | 0.56806771 | 0.5721117 |
| Kurtosis | -1.1973026 | -1.23997 |
| Mean | 457.05605 | 455.97085 |
| Median Absolute Deviation (MAD) | 225.5 | 226.5 |
| Skewness | -0.092729601 | -0.030955301 |
| Sum | 203847 | 203363 |
| Variance | 67412.3 | 68051.206 |
| Monotonicity | Not monotonic | Not monotonic |
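Several descriptive statistics are simple functions of the mean, the standard deviation, and the number of observations. The sketch below recomputes the coefficient of variation (standard deviation divided by mean), the variance (standard deviation squared), and the sum (mean times count) for PassengerId. Because the inputs are the rounded values printed in the table, the results agree with the report only up to rounding. The `derived` helper is a hypothetical name.

```python
# Mean, standard deviation and row count for PassengerId, copied from the tables.
stats = {
    "Dataset A": {"mean": 457.05605, "std": 259.63879, "n": 446},
    "Dataset B": {"mean": 455.97085, "std": 260.86626, "n": 446},
}


def derived(s: dict) -> dict:
    """Recompute CV, variance and sum from mean, std and count."""
    return {
        "cv": s["std"] / s["mean"],   # coefficient of variation
        "variance": s["std"] ** 2,    # variance is the squared std
        "sum": s["mean"] * s["n"],    # sum is mean times number of rows
    }


for name, s in stats.items():
    d = derived(s)
    print(f"{name}: CV={d['cv']:.6f} variance={d['variance']:.2f} sum={d['sum']:.0f}")
```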
| Value | Count | Frequency (%) |
| 191 | 1 | 0.2% |
| 527 | 1 | 0.2% |
| 83 | 1 | 0.2% |
| 256 | 1 | 0.2% |
| 167 | 1 | 0.2% |
| 428 | 1 | 0.2% |
| 687 | 1 | 0.2% |
| 610 | 1 | 0.2% |
| 28 | 1 | 0.2% |
| 327 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 106 | 1 | 0.2% |
| 708 | 1 | 0.2% |
| 689 | 1 | 0.2% |
| 409 | 1 | 0.2% |
| 171 | 1 | 0.2% |
| 450 | 1 | 0.2% |
| 269 | 1 | 0.2% |
| 498 | 1 | 0.2% |
| 878 | 1 | 0.2% |
| 545 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 4 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 12 | 1 | |
| 14 | 1 | |
| 15 | 1 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 14 | 1 | |
| 17 | 1 | |
| 20 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 2 | 2 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 0 |
| 2nd row | 0 | 1 |
| 3rd row | 1 | 0 |
| 4th row | 0 | 1 |
| 5th row | 0 | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 287 | |
| 1 | 159 |
| Value | Count | Frequency (%) |
| 0 | 275 | |
| 1 | 171 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 287 | |
| 1 | 159 |
| Value | Count | Frequency (%) |
| 0 | 275 | |
| 1 | 171 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 287 | |
| 1 | 159 |
| Value | Count | Frequency (%) |
| 0 | 275 | |
| 1 | 171 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 287 | |
| 1 | 159 |
| Value | Count | Frequency (%) |
| 0 | 275 | |
| 1 | 171 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 287 | |
| 1 | 159 |
| Value | Count | Frequency (%) |
| 0 | 275 | |
| 1 | 171 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 2 | 3 |
| 2nd row | 3 | 1 |
| 3rd row | 3 | 2 |
| 4th row | 2 | 3 |
| 5th row | 3 | 2 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 254 | |
| 1 | 108 | |
| 2 | 84 | 18.8% |
| Value | Count | Frequency (%) |
| 3 | 245 | |
| 1 | 111 | |
| 2 | 90 | 20.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 254 | |
| 1 | 108 | |
| 2 | 84 | 18.8% |
| Value | Count | Frequency (%) |
| 3 | 245 | |
| 1 | 111 | |
| 2 | 90 | 20.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 254 | |
| 1 | 108 | |
| 2 | 84 | 18.8% |
| Value | Count | Frequency (%) |
| 3 | 245 | |
| 1 | 111 | |
| 2 | 90 | 20.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 254 | |
| 1 | 108 | |
| 2 | 84 | 18.8% |
| Value | Count | Frequency (%) |
| 3 | 245 | |
| 1 | 111 | |
| 2 | 90 | 20.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 254 | |
| 1 | 108 | |
| 2 | 84 | 18.8% |
| Value | Count | Frequency (%) |
| 3 | 245 | |
| 1 | 111 | |
| 2 | 90 | 20.2% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 82 |
| Median length | 50 | 50 |
| Mean length | 26.865471 | 26.710762 |
| Min length | 12 | 12 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 11982 | 11913 |
| Distinct characters | 59 | 60 |
| Distinct categories | 7 | 7 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Pinsky, Mrs. (Rosa) | Mionoff, Mr. Stoytcho |
| 2nd row | Jensen, Mr. Svend Lauritz | Bradley, Mr. George ("George Arthur Brayton") |
| 3rd row | Baclini, Miss. Marie Catherine | Gill, Mr. John William |
| 4th row | Chapman, Mr. Charles Henry | de Mulder, Mr. Theodore |
| 5th row | Naidenoff, Mr. Penko | Mitchell, Mr. Henry Michael |
| Value | Count | Frequency (%) |
| mr | 269 | 14.8% |
| miss | 82 | 4.5% |
| mrs | 70 | 3.8% |
| john | 27 | 1.5% |
| william | 24 | 1.3% |
| henry | 18 | 1.0% |
| master | 16 | 0.9% |
| thomas | 12 | 0.7% |
| edward | 12 | 0.7% |
| elizabeth | 11 | 0.6% |
| Other values (896) | 1280 |
| Value | Count | Frequency (%) |
| mr | 275 | 15.2% |
| miss | 76 | 4.2% |
| mrs | 67 | 3.7% |
| william | 30 | 1.7% |
| john | 21 | 1.2% |
| henry | 18 | 1.0% |
| master | 17 | 0.9% |
| edward | 14 | 0.8% |
| thomas | 12 | 0.7% |
| james | 11 | 0.6% |
| Other values (907) | 1269 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1376 | 11.5% |
| r | 951 | 7.9% |
| a | 833 | 7.0% |
| e | 829 | 6.9% |
| n | 673 | 5.6% |
| s | 637 | 5.3% |
| i | 623 | 5.2% |
| M | 565 | 4.7% |
| l | 522 | 4.4% |
| o | 521 | 4.3% |
| Other values (49) | 4452 |
| Value | Count | Frequency (%) |
| (space) | 1365 | 11.5% |
| r | 996 | 8.4% |
| e | 850 | 7.1% |
| a | 818 | 6.9% |
| n | 646 | 5.4% |
| i | 644 | 5.4% |
| s | 629 | 5.3% |
| M | 542 | 4.5% |
| l | 536 | 4.5% |
| o | 483 | 4.1% |
| Other values (50) | 4404 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7676 | |
| Uppercase Letter | 1828 | 15.3% |
| Space Separator | 1376 | 11.5% |
| Other Punctuation | 939 | 7.8% |
| Close Punctuation | 78 | 0.7% |
| Open Punctuation | 78 | 0.7% |
| Dash Punctuation | 7 | 0.1% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 7620 | |
| Uppercase Letter | 1818 | 15.3% |
| Space Separator | 1365 | 11.5% |
| Other Punctuation | 950 | 8.0% |
| Close Punctuation | 77 | 0.6% |
| Open Punctuation | 77 | 0.6% |
| Dash Punctuation | 6 | 0.1% |
Most frequent character per category
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1376 | |
| Value | Count | Frequency (%) |
| (space) | 1365 | |
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 951 | |
| a | 833 | |
| e | 829 | |
| n | 673 | |
| s | 637 | |
| i | 623 | |
| l | 522 | 6.8% |
| o | 521 | 6.8% |
| t | 344 | 4.5% |
| h | 260 | 3.4% |
| Other values (16) | 1483 |
| Value | Count | Frequency (%) |
| r | 996 | |
| e | 850 | |
| a | 818 | |
| n | 646 | |
| i | 644 | |
| s | 629 | |
| l | 536 | 7.0% |
| o | 483 | 6.3% |
| t | 304 | 4.0% |
| d | 265 | 3.5% |
| Other values (16) | 1449 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 565 | |
| A | 127 | 6.9% |
| J | 116 | 6.3% |
| H | 92 | 5.0% |
| S | 90 | 4.9% |
| E | 83 | 4.5% |
| B | 82 | 4.5% |
| C | 71 | 3.9% |
| W | 67 | 3.7% |
| L | 62 | 3.4% |
| Other values (15) | 473 |
| Value | Count | Frequency (%) |
| M | 542 | |
| A | 129 | 7.1% |
| J | 111 | 6.1% |
| H | 95 | 5.2% |
| C | 92 | 5.1% |
| E | 89 | 4.9% |
| S | 79 | 4.3% |
| L | 78 | 4.3% |
| B | 72 | 4.0% |
| W | 65 | 3.6% |
| Other values (15) | 466 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 447 | |
| , | 446 | |
| " | 42 | 4.5% |
| ' | 4 | 0.4% |
| Value | Count | Frequency (%) |
| . | 446 | |
| , | 446 | |
| " | 54 | 5.7% |
| ' | 3 | 0.3% |
| / | 1 | 0.1% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 78 |
| Value | Count | Frequency (%) |
| ) | 77 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 78 |
| Value | Count | Frequency (%) |
| ( | 77 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 7 |
| Value | Count | Frequency (%) |
| - | 6 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9504 | |
| Common | 2478 | 20.7% |
| Value | Count | Frequency (%) |
| Latin | 9438 | |
| Common | 2475 | 20.8% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 1376 | |
| . | 447 | 18.0% |
| , | 446 | 18.0% |
| ) | 78 | 3.1% |
| ( | 78 | 3.1% |
| " | 42 | 1.7% |
| - | 7 | 0.3% |
| ' | 4 | 0.2% |
| Value | Count | Frequency (%) |
| (space) | 1365 | |
| . | 446 | 18.0% |
| , | 446 | 18.0% |
| ) | 77 | 3.1% |
| ( | 77 | 3.1% |
| " | 54 | 2.2% |
| - | 6 | 0.2% |
| ' | 3 | 0.1% |
| / | 1 | < 0.1% |
Latin
| Value | Count | Frequency (%) |
| r | 951 | 10.0% |
| a | 833 | 8.8% |
| e | 829 | 8.7% |
| n | 673 | 7.1% |
| s | 637 | 6.7% |
| i | 623 | 6.6% |
| M | 565 | 5.9% |
| l | 522 | 5.5% |
| o | 521 | 5.5% |
| t | 344 | 3.6% |
| Other values (41) | 3006 |
| Value | Count | Frequency (%) |
| r | 996 | 10.6% |
| e | 850 | 9.0% |
| a | 818 | 8.7% |
| n | 646 | 6.8% |
| i | 644 | 6.8% |
| s | 629 | 6.7% |
| M | 542 | 5.7% |
| l | 536 | 5.7% |
| o | 483 | 5.1% |
| t | 304 | 3.2% |
| Other values (41) | 2990 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 11982 |
| Value | Count | Frequency (%) |
| ASCII | 11913 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 1376 | 11.5% |
| r | 951 | 7.9% |
| a | 833 | 7.0% |
| e | 829 | 6.9% |
| n | 673 | 5.6% |
| s | 637 | 5.3% |
| i | 623 | 5.2% |
| M | 565 | 4.7% |
| l | 522 | 4.4% |
| o | 521 | 4.3% |
| Other values (49) | 4452 |
| Value | Count | Frequency (%) |
| (space) | 1365 | 11.5% |
| r | 996 | 8.4% |
| e | 850 | 7.1% |
| a | 818 | 6.9% |
| n | 646 | 5.4% |
| i | 644 | 5.4% |
| s | 629 | 5.3% |
| M | 542 | 4.5% |
| l | 536 | 4.5% |
| o | 483 | 4.1% |
| Other values (50) | 4404 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.6816143 | 4.6457399 |
| Min length | 4 | 4 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2088 | 2072 |
| Distinct characters | 5 | 5 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | female | male |
| 2nd row | male | male |
| 3rd row | female | male |
| 4th row | male | male |
| 5th row | male | male |
Common Values
| Value | Count | Frequency (%) |
| male | 294 | |
| female | 152 |
| Value | Count | Frequency (%) |
| male | 302 | |
| female | 144 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 598 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 152 | 7.3% |
| Value | Count | Frequency (%) |
| e | 590 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 144 | 6.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 2088 |
| Value | Count | Frequency (%) |
| Lowercase Letter | 2072 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 598 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 152 | 7.3% |
| Value | Count | Frequency (%) |
| e | 590 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 144 | 6.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2088 |
| Value | Count | Frequency (%) |
| Latin | 2072 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 598 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 152 | 7.3% |
| Value | Count | Frequency (%) |
| e | 590 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 144 | 6.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2088 |
| Value | Count | Frequency (%) |
| ASCII | 2072 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 598 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 152 | 7.3% |
| Value | Count | Frequency (%) |
| e | 590 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 144 | 6.9% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 76 | 79 |
| Distinct (%) | 21.0% | 22.1% |
| Missing | 84 | 88 |
| Missing (%) | 18.8% | 19.7% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 30.072072 | 30.973715 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| Maximum | 80 | 71 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
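The Age figures are consistent with Distinct (%) being computed over non-missing observations rather than over all 446 rows. This is an inference from the numbers in the tables, not a documented statement about the profiling tool. The helper `distinct_pct` below is a hypothetical name used only to check that reading.

```python
N_OBSERVATIONS = 446

# (distinct values, missing values) for Age, copied from the tables.
age = {"Dataset A": (76, 84), "Dataset B": (79, 88)}


def distinct_pct(distinct: int, n_rows: int, n_missing: int) -> float:
    """Distinct values as a percentage of the non-missing observations."""
    return 100.0 * distinct / (n_rows - n_missing)


for name, (distinct, missing) in age.items():
    print(f"{name}: {distinct_pct(distinct, N_OBSERVATIONS, missing):.1f}%")
```

Dividing by all 446 rows instead would give roughly 17.0% and 17.7%, which does not match the reported 21.0% and 22.1%.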
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| 5-th percentile | 6.05 | 4.85 |
| Q1 | 20 | 21 |
| median | 29 | 30 |
| Q3 | 38 | 40 |
| 95-th percentile | 55.95 | 59.15 |
| Maximum | 80 | 71 |
| Range | 79.58 | 70.58 |
| Interquartile range (IQR) | 18 | 19 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.350124 | 14.604977 |
| Coefficient of variation (CV) | 0.47719106 | 0.47152809 |
| Kurtosis | 0.21040517 | 0.057642725 |
| Mean | 30.072072 | 30.973715 |
| Median Absolute Deviation (MAD) | 9 | 9 |
| Skewness | 0.42612785 | 0.33707854 |
| Sum | 10886.09 | 11088.59 |
| Variance | 205.92605 | 213.30534 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 18 | 17 | 3.8% |
| 19 | 15 | 3.4% |
| 24 | 14 | 3.1% |
| 22 | 13 | 2.9% |
| 25 | 12 | 2.7% |
| 21 | 11 | 2.5% |
| 31 | 11 | 2.5% |
| 27 | 11 | 2.5% |
| 29 | 11 | 2.5% |
| 33 | 10 | 2.2% |
| Other values (66) | 237 | |
| (Missing) | 84 | 18.8% |
| Value | Count | Frequency (%) |
| 36 | 15 | 3.4% |
| 18 | 14 | 3.1% |
| 24 | 13 | 2.9% |
| 21 | 12 | 2.7% |
| 30 | 12 | 2.7% |
| 27 | 12 | 2.7% |
| 22 | 12 | 2.7% |
| 19 | 11 | 2.5% |
| 31 | 11 | 2.5% |
| 25 | 10 | 2.2% |
| Other values (69) | 236 | |
| (Missing) | 88 | 19.7% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 3 | |
| 2 | 6 | |
| 3 | 1 | 0.2% |
| 4 | 2 | 0.4% |
| 5 | 2 | 0.4% |
| 6 | 2 | 0.4% |
| 7 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 3 | |
| 2 | 4 | |
| 3 | 3 | |
| 4 | 5 | |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 7 | 3 |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.57623318 | 0.48878924 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 294 | 306 |
| Zeros (%) | 65.9% | 68.6% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 3 | 2 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.1346514 | 0.99825009 |
| Coefficient of variation (CV) | 1.9690838 | 2.0422915 |
| Kurtosis | 14.119952 | 17.217133 |
| Mean | 0.57623318 | 0.48878924 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.277949 | 3.5165073 |
| Sum | 257 | 218 |
| Variance | 1.2874339 | 0.99650325 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 294 | |
| 1 | 107 | 24.0% |
| 2 | 18 | 4.0% |
| 4 | 10 | 2.2% |
| 3 | 10 | 2.2% |
| 5 | 4 | 0.9% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 306 | |
| 1 | 106 | 23.8% |
| 2 | 13 | 2.9% |
| 4 | 11 | 2.5% |
| 3 | 7 | 1.6% |
| 8 | 2 | 0.4% |
| 5 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 294 | |
| 1 | 107 | 24.0% |
| 2 | 18 | 4.0% |
| 3 | 10 | 2.2% |
| 4 | 10 | 2.2% |
| 5 | 4 | 0.9% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 306 | |
| 1 | 106 | 23.8% |
| 2 | 13 | 2.9% |
| 3 | 7 | 1.6% |
| 4 | 11 | 2.5% |
| 5 | 1 | 0.2% |
| 8 | 2 | 0.4% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 6 |
| Distinct (%) | 1.3% | 1.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.37892377 | 0.35426009 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 5 |
| Zeros | 334 | 348 |
| Zeros (%) | 74.9% | 78.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0.75 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 5 | 5 |
| Range | 5 | 5 |
| Interquartile range (IQR) | 0.75 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.75989633 | 0.79609568 |
| Coefficient of variation (CV) | 2.0054069 | 2.2472068 |
| Kurtosis | 7.5813073 | 10.202235 |
| Mean | 0.37892377 | 0.35426009 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.4519294 | 2.9011138 |
| Sum | 169 | 158 |
| Variance | 0.57744243 | 0.63376833 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 334 | |
| 1 | 67 | 15.0% |
| 2 | 38 | 8.5% |
| 3 | 4 | 0.9% |
| 5 | 2 | 0.4% |
| 4 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 348 | |
| 1 | 56 | 12.6% |
| 2 | 33 | 7.4% |
| 3 | 3 | 0.7% |
| 5 | 3 | 0.7% |
| 4 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 334 | |
| 1 | 67 | 15.0% |
| 2 | 38 | 8.5% |
| 3 | 4 | 0.9% |
| 4 | 1 | 0.2% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 348 | |
| 1 | 56 | 12.6% |
| 2 | 33 | 7.4% |
| 3 | 3 | 0.7% |
| 4 | 3 | 0.7% |
| 5 | 3 | 0.7% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 382 | 385 |
| Distinct (%) | 85.7% | 86.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.7174888 | 6.7600897 |
| Min length | 4 | 3 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2996 | 3015 |
| Distinct characters | 35 | 32 |
| Distinct categories | 5 | 5 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 331 | 336 |
| Unique (%) | 74.2% | 75.3% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 234604 | 349207 |
| 2nd row | 350048 | 111427 |
| 3rd row | 2666 | 233866 |
| 4th row | 248731 | 345774 |
| 5th row | 349206 | C.A. 24580 |
| Value | Count | Frequency (%) |
| pc | 27 | 4.8% |
| c.a | 16 | 2.8% |
| a/5 | 10 | 1.8% |
| ca | 7 | 1.2% |
| 2 | 6 | 1.1% |
| ston/o | 6 | 1.1% |
| soton/o.q | 4 | 0.7% |
| f.c.c | 4 | 0.7% |
| w./c | 4 | 0.7% |
| a/4 | 4 | 0.7% |
| Other values (399) | 475 |
| Value | Count | Frequency (%) |
| pc | 30 | 5.3% |
| c.a | 17 | 3.0% |
| a/5 | 9 | 1.6% |
| sc/paris | 7 | 1.2% |
| 347082 | 6 | 1.1% |
| 2 | 6 | 1.1% |
| ston/o | 6 | 1.1% |
| soton/oq | 5 | 0.9% |
| 1601 | 4 | 0.7% |
| 17474 | 3 | 0.5% |
| Other values (406) | 469 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 365 | |
| 1 | 342 | |
| 2 | 312 | |
| 7 | 243 | |
| 4 | 239 | |
| 6 | 201 | 6.7% |
| 5 | 194 | 6.5% |
| 0 | 191 | 6.4% |
| 9 | 185 | 6.2% |
| 8 | 139 | 4.6% |
| Other values (25) | 585 |
| Value | Count | Frequency (%) |
| 1 | 373 | |
| 3 | 372 | |
| 2 | 296 | |
| 7 | 245 | |
| 4 | 217 | 7.2% |
| 6 | 203 | 6.7% |
| 5 | 202 | 6.7% |
| 0 | 195 | 6.5% |
| 9 | 161 | 5.3% |
| 8 | 142 | 4.7% |
| Other values (22) | 609 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2411 | |
| Uppercase Letter | 314 | 10.5% |
| Other Punctuation | 145 | 4.8% |
| Space Separator | 117 | 3.9% |
| Lowercase Letter | 9 | 0.3% |
| Value | Count | Frequency (%) |
| Decimal Number | 2406 | |
| Uppercase Letter | 338 | 11.2% |
| Other Punctuation | 142 | 4.7% |
| Space Separator | 116 | 3.8% |
| Lowercase Letter | 13 | 0.4% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 365 | |
| 1 | 342 | |
| 2 | 312 | |
| 7 | 243 | |
| 4 | 239 | |
| 6 | 201 | |
| 5 | 194 | |
| 0 | 191 | |
| 9 | 185 | |
| 8 | 139 | 5.8% |
| Value | Count | Frequency (%) |
| 1 | 373 | |
| 3 | 372 | |
| 2 | 296 | |
| 7 | 245 | |
| 4 | 217 | |
| 6 | 203 | |
| 5 | 202 | |
| 0 | 195 | |
| 9 | 161 | |
| 8 | 142 | 5.9% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 117 | |
| Value | Count | Frequency (%) |
| (space) | 116 | |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 97 | |
| / | 48 |
| Value | Count | Frequency (%) |
| . | 91 | |
| / | 51 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 75 | |
| O | 49 | |
| A | 44 | |
| P | 43 | |
| S | 34 | |
| N | 18 | 5.7% |
| T | 17 | 5.4% |
| Q | 8 | 2.5% |
| W | 7 | 2.2% |
| F | 6 | 1.9% |
| Other values (6) | 13 | 4.1% |
| Value | Count | Frequency (%) |
| C | 73 | |
| O | 53 | |
| P | 50 | |
| A | 41 | |
| S | 41 | |
| N | 22 | 6.5% |
| T | 19 | 5.6% |
| Q | 8 | 2.4% |
| I | 8 | 2.4% |
| W | 6 | 1.8% |
| Other values (5) | 17 | 5.0% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 3 | |
| s | 2 | |
| r | 1 | 11.1% |
| i | 1 | 11.1% |
| l | 1 | 11.1% |
| e | 1 | 11.1% |
| Value | Count | Frequency (%) |
| a | 4 | |
| r | 3 | |
| i | 3 | |
| s | 3 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 2673 | |
| Latin | 323 | 10.8% |
| Value | Count | Frequency (%) |
| Common | 2664 | |
| Latin | 351 | 11.6% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 365 | |
| 1 | 342 | |
| 2 | 312 | |
| 7 | 243 | |
| 4 | 239 | |
| 6 | 201 | |
| 5 | 194 | |
| 0 | 191 | |
| 9 | 185 | |
| 8 | 139 | 5.2% |
| Other values (3) | 262 |
| Value | Count | Frequency (%) |
| 1 | 373 | |
| 3 | 372 | |
| 2 | 296 | |
| 7 | 245 | |
| 4 | 217 | |
| 6 | 203 | |
| 5 | 202 | |
| 0 | 195 | |
| 9 | 161 | |
| 8 | 142 | 5.3% |
| Other values (3) | 258 |
Latin
| Value | Count | Frequency (%) |
| C | 75 | |
| O | 49 | |
| A | 44 | |
| P | 43 | |
| S | 34 | |
| N | 18 | 5.6% |
| T | 17 | 5.3% |
| Q | 8 | 2.5% |
| W | 7 | 2.2% |
| F | 6 | 1.9% |
| Other values (12) | 22 | 6.8% |
| Value | Count | Frequency (%) |
| C | 73 | |
| O | 53 | |
| P | 50 | |
| A | 41 | |
| S | 41 | |
| N | 22 | 6.3% |
| T | 19 | 5.4% |
| Q | 8 | 2.3% |
| I | 8 | 2.3% |
| W | 6 | 1.7% |
| Other values (9) | 30 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2996 |
| Value | Count | Frequency (%) |
| ASCII | 3015 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 365 | |
| 1 | 342 | |
| 2 | 312 | |
| 7 | 243 | |
| 4 | 239 | |
| 6 | 201 | 6.7% |
| 5 | 194 | 6.5% |
| 0 | 191 | 6.4% |
| 9 | 185 | 6.2% |
| 8 | 139 | 4.6% |
| Other values (25) | 585 |
| Value | Count | Frequency (%) |
| 1 | 373 | |
| 3 | 372 | |
| 2 | 296 | |
| 7 | 245 | |
| 4 | 217 | 7.2% |
| 6 | 203 | 6.7% |
| 5 | 202 | 6.7% |
| 0 | 195 | 6.5% |
| 9 | 161 | 5.3% |
| 8 | 142 | 4.7% |
| Other values (22) | 609 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 181 | 179 |
| Distinct (%) | 40.6% | 40.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 33.269721 | 30.389648 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 6 | 9 |
| Zeros (%) | 1.3% | 2.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.129175 | 7.0542 |
| Q1 | 7.8958 | 7.8958 |
| median | 14.45625 | 13.93125 |
| Q3 | 31 | 30.5 |
| 95-th percentile | 112.67708 | 106.425 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 23.1042 | 22.6042 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 53.559629 | 45.117134 |
| Coefficient of variation (CV) | 1.6098611 | 1.4846218 |
| Kurtosis | 32.276095 | 35.222513 |
| Mean | 33.269721 | 30.389648 |
| Median Absolute Deviation (MAD) | 6.93335 | 6.63125 |
| Skewness | 4.8214596 | 4.7708925 |
| Sum | 14838.295 | 13553.783 |
| Variance | 2868.6338 | 2035.5558 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 7.8958 | 27 | 6.1% |
| 8.05 | 18 | 4.0% |
| 13 | 16 | 3.6% |
| 7.75 | 14 | 3.1% |
| 26 | 14 | 3.1% |
| 10.5 | 11 | 2.5% |
| 8.6625 | 9 | 2.0% |
| 7.925 | 9 | 2.0% |
| 7.2292 | 9 | 2.0% |
| 7.8542 | 7 | 1.6% |
| Other values (171) | 312 |
| Value | Count | Frequency (%) |
| 13 | 22 | 4.9% |
| 8.05 | 21 | 4.7% |
| 7.8958 | 20 | 4.5% |
| 10.5 | 14 | 3.1% |
| 26 | 13 | 2.9% |
| 7.75 | 12 | 2.7% |
| 7.775 | 10 | 2.2% |
| 7.925 | 10 | 2.2% |
| 7.2292 | 9 | 2.0% |
| 0 | 9 | 2.0% |
| Other values (169) | 306 |
| Value | Count | Frequency (%) |
| 0 | 6 | |
| 6.2375 | 1 | 0.2% |
| 6.4958 | 2 | 0.4% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 4 | |
| 7.0542 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 9 | |
| 5 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 2 | 0.4% |
| 6.95 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 87 | 88 |
| Distinct (%) | 82.9% | 84.6% |
| Missing | 341 | 342 |
| Missing (%) | 76.5% | 76.7% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 11 |
| Median length | 3 | 3 |
| Mean length | 3.7619048 | 3.5384615 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 395 | 368 |
| Distinct characters | 19 | 18 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 71 | 72 |
| Unique (%) | 67.6% | 69.2% |
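The Cabin percentages use two different denominators: Missing (%) is relative to all 446 observations (the count given in the report's dataset statistics), while Distinct (%), Unique (%), and mean length appear to be relative to the non-missing values only. The sketch below checks that reading against the figures above. It is an inference from the numbers, not a statement about the profiler's code, and the names `cabin` and `cabin_summary` are illustrative.

```python
N_ROWS = 446  # number of observations per dataset, from the report overview

# Counts copied from the Cabin tables above.
cabin = {
    "A": {"missing": 341, "distinct": 87, "unique": 71, "total_chars": 395},
    "B": {"missing": 342, "distinct": 88, "unique": 72, "total_chars": 368},
}


def cabin_summary(stats, n_rows=N_ROWS):
    """Recompute the Cabin ratios under the assumed denominators."""
    present = n_rows - stats["missing"]  # rows with a Cabin value
    return {
        "present": present,
        # Missing share is taken over all rows.
        "missing_pct": round(100 * stats["missing"] / n_rows, 1),
        # Distinct and unique shares are taken over non-missing rows.
        "distinct_pct": round(100 * stats["distinct"] / present, 1),
        "unique_pct": round(100 * stats["unique"] / present, 1),
        # Mean length is total characters per non-missing value.
        "mean_length": round(stats["total_chars"] / present, 7),
    }


for name, stats in cabin.items():
    print(name, cabin_summary(stats))
```

Under these assumptions the recomputed values match the reported 82.9% / 84.6% distinct, 67.6% / 69.2% unique, and mean lengths of 3.7619048 / 3.5384615.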
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | E67 | B18 |
| 2nd row | E40 | C22 C26 |
| 3rd row | C92 | C78 |
| 4th row | F G63 | C83 |
| 5th row | D36 | B77 |
| Value | Count | Frequency (%) |
| c23 | 4 | 3.2% |
| c27 | 4 | 3.2% |
| c25 | 4 | 3.2% |
| c78 | 2 | 1.6% |
| c92 | 2 | 1.6% |
| c65 | 2 | 1.6% |
| b20 | 2 | 1.6% |
| e33 | 2 | 1.6% |
| c26 | 2 | 1.6% |
| c22 | 2 | 1.6% |
| Other values (87) | 99 | 79.2% |
| Value | Count | Frequency (%) |
| b28 | 2 | 1.7% |
| e8 | 2 | 1.7% |
| b96 | 2 | 1.7% |
| f33 | 2 | 1.7% |
| c83 | 2 | 1.7% |
| c78 | 2 | 1.7% |
| c26 | 2 | 1.7% |
| c22 | 2 | 1.7% |
| c68 | 2 | 1.7% |
| c65 | 2 | 1.7% |
| Other values (87) | 97 | 82.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| C | 50 | 12.7% |
| 2 | 47 | 11.9% |
| 1 | 38 | 9.6% |
| 3 | 32 | 8.1% |
| 6 | 28 | 7.1% |
| B | 26 | 6.6% |
| 5 | 22 | 5.6% |
| (space) | 20 | 5.1% |
| 0 | 19 | 4.8% |
| D | 19 | 4.8% |
| Other values (9) | 94 | 23.8% |
| Value | Count | Frequency (%) |
| C | 43 | 11.7% |
| 1 | 37 | 10.1% |
| 2 | 36 | 9.8% |
| B | 28 | 7.6% |
| 3 | 26 | 7.1% |
| 6 | 26 | 7.1% |
| 5 | 25 | 6.8% |
| 8 | 25 | 6.8% |
| 4 | 18 | 4.9% |
| 0 | 17 | 4.6% |
| Other values (8) | 87 | 23.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 250 | 63.3% |
| Uppercase Letter | 125 | 31.6% |
| Space Separator | 20 | 5.1% |
| Value | Count | Frequency (%) |
| Decimal Number | 238 | 64.7% |
| Uppercase Letter | 117 | 31.8% |
| Space Separator | 13 | 3.5% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 50 | 40.0% |
| B | 26 | 20.8% |
| D | 19 | 15.2% |
| E | 15 | 12.0% |
| A | 7 | 5.6% |
| F | 5 | 4.0% |
| G | 2 | 1.6% |
| T | 1 | 0.8% |
| Value | Count | Frequency (%) |
| C | 43 | 36.8% |
| B | 28 | 23.9% |
| E | 17 | 14.5% |
| D | 16 | 13.7% |
| A | 9 | 7.7% |
| F | 3 | 2.6% |
| G | 1 | 0.9% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 47 | 18.8% |
| 1 | 38 | 15.2% |
| 3 | 32 | 12.8% |
| 6 | 28 | 11.2% |
| 5 | 22 | 8.8% |
| 0 | 19 | 7.6% |
| 8 | 19 | 7.6% |
| 7 | 17 | 6.8% |
| 4 | 17 | 6.8% |
| 9 | 11 | 4.4% |
| Value | Count | Frequency (%) |
| 1 | 37 | 15.5% |
| 2 | 36 | 15.1% |
| 3 | 26 | 10.9% |
| 6 | 26 | 10.9% |
| 5 | 25 | 10.5% |
| 8 | 25 | 10.5% |
| 4 | 18 | 7.6% |
| 0 | 17 | 7.1% |
| 7 | 16 | 6.7% |
| 9 | 12 | 5.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 20 | 100.0% |
| Value | Count | Frequency (%) |
| (space) | 13 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 270 | 68.4% |
| Latin | 125 | 31.6% |
| Value | Count | Frequency (%) |
| Common | 251 | 68.2% |
| Latin | 117 | 31.8% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| C | 50 | 40.0% |
| B | 26 | 20.8% |
| D | 19 | 15.2% |
| E | 15 | 12.0% |
| A | 7 | 5.6% |
| F | 5 | 4.0% |
| G | 2 | 1.6% |
| T | 1 | 0.8% |
| Value | Count | Frequency (%) |
| C | 43 | 36.8% |
| B | 28 | 23.9% |
| E | 17 | 14.5% |
| D | 16 | 13.7% |
| A | 9 | 7.7% |
| F | 3 | 2.6% |
| G | 1 | 0.9% |
Common
| Value | Count | Frequency (%) |
| 2 | 47 | 17.4% |
| 1 | 38 | 14.1% |
| 3 | 32 | 11.9% |
| 6 | 28 | 10.4% |
| 5 | 22 | 8.1% |
| (space) | 20 | 7.4% |
| 0 | 19 | 7.0% |
| 8 | 19 | 7.0% |
| 7 | 17 | 6.3% |
| 4 | 17 | 6.3% |
| Value | Count | Frequency (%) |
| 1 | 37 | 14.7% |
| 2 | 36 | 14.3% |
| 3 | 26 | 10.4% |
| 6 | 26 | 10.4% |
| 5 | 25 | 10.0% |
| 8 | 25 | 10.0% |
| 4 | 18 | 7.2% |
| 0 | 17 | 6.8% |
| 7 | 16 | 6.4% |
| (space) | 13 | 5.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 395 | 100.0% |
| Value | Count | Frequency (%) |
| ASCII | 368 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| C | 50 | 12.7% |
| 2 | 47 | 11.9% |
| 1 | 38 | 9.6% |
| 3 | 32 | 8.1% |
| 6 | 28 | 7.1% |
| B | 26 | 6.6% |
| 5 | 22 | 5.6% |
| (space) | 20 | 5.1% |
| 0 | 19 | 4.8% |
| D | 19 | 4.8% |
| Other values (9) | 94 | 23.8% |
| Value | Count | Frequency (%) |
| C | 43 | 11.7% |
| 1 | 37 | 10.1% |
| 2 | 36 | 9.8% |
| B | 28 | 7.6% |
| 3 | 26 | 7.1% |
| 6 | 26 | 7.1% |
| 5 | 25 | 6.8% |
| 8 | 25 | 6.8% |
| 4 | 18 | 4.9% |
| 0 | 17 | 4.6% |
| Other values (8) | 87 | 23.6% |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 1 | 2 |
| Missing (%) | 0.2% | 0.4% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 445 | 444 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | S |
| 2nd row | S | S |
| 3rd row | C | S |
| 4th row | S | S |
| 5th row | S | S |
Common Values
| Value | Count | Frequency (%) |
| S | 322 | 72.2% |
| C | 86 | 19.3% |
| Q | 37 | 8.3% |
| (Missing) | 1 | 0.2% |
| Value | Count | Frequency (%) |
| S | 324 | 72.6% |
| C | 81 | 18.2% |
| Q | 39 | 8.7% |
| (Missing) | 2 | 0.4% |
| Value | Count | Frequency (%) |
| s | 322 | 72.4% |
| c | 86 | 19.3% |
| q | 37 | 8.3% |
| Value | Count | Frequency (%) |
| s | 324 | 73.0% |
| c | 81 | 18.2% |
| q | 39 | 8.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 322 | 72.4% |
| C | 86 | 19.3% |
| Q | 37 | 8.3% |
| Value | Count | Frequency (%) |
| S | 324 | 73.0% |
| C | 81 | 18.2% |
| Q | 39 | 8.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 445 | 100.0% |
| Value | Count | Frequency (%) |
| Uppercase Letter | 444 | 100.0% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 322 | 72.4% |
| C | 86 | 19.3% |
| Q | 37 | 8.3% |
| Value | Count | Frequency (%) |
| S | 324 | 73.0% |
| C | 81 | 18.2% |
| Q | 39 | 8.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 445 | 100.0% |
| Value | Count | Frequency (%) |
| Latin | 444 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 322 | 72.4% |
| C | 86 | 19.3% |
| Q | 37 | 8.3% |
| Value | Count | Frequency (%) |
| S | 324 | 73.0% |
| C | 81 | 18.2% |
| Q | 39 | 8.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 445 | 100.0% |
| Value | Count | Frequency (%) |
| ASCII | 444 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| S | 322 | 72.4% |
| C | 86 | 19.3% |
| Q | 37 | 8.3% |
| Value | Count | Frequency (%) |
| S | 324 | 73.0% |
| C | 81 | 18.2% |
| Q | 39 | 8.8% |
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.000 | 0.151 | -0.268 | 0.044 | 0.287 | 0.000 | -0.200 | 0.125 |
| Embarked | 0.000 | 1.000 | -0.082 | 0.054 | -0.003 | 0.262 | 0.020 | 0.030 | 0.172 |
| Fare | 0.151 | -0.082 | 1.000 | 0.422 | -0.031 | 0.477 | 0.182 | 0.498 | 0.307 |
| Parch | -0.268 | 0.054 | 0.422 | 1.000 | -0.005 | 0.041 | 0.311 | 0.484 | 0.154 |
| PassengerId | 0.044 | -0.003 | -0.031 | -0.005 | 1.000 | 0.072 | 0.000 | -0.140 | 0.052 |
| Pclass | 0.287 | 0.262 | 0.477 | 0.041 | 0.072 | 1.000 | 0.144 | -0.086 | 0.347 |
| Sex | 0.000 | 0.020 | 0.182 | 0.311 | 0.000 | 0.144 | 1.000 | -0.223 | 0.555 |
| SibSp | -0.200 | 0.030 | 0.498 | 0.484 | -0.140 | -0.086 | -0.223 | 1.000 | 0.148 |
| Survived | 0.125 | 0.172 | 0.307 | 0.154 | 0.052 | 0.347 | 0.555 | 0.148 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.101 | 0.193 | -0.123 | 0.069 | 0.268 | 0.030 | -0.122 | 0.100 |
| Embarked | 0.101 | 1.000 | -0.067 | 0.042 | 0.017 | 0.206 | 0.081 | -0.002 | 0.132 |
| Fare | 0.193 | -0.067 | 1.000 | 0.378 | -0.040 | 0.494 | 0.190 | 0.460 | 0.298 |
| Parch | -0.123 | 0.042 | 0.378 | 1.000 | -0.100 | 0.020 | 0.305 | 0.432 | 0.132 |
| PassengerId | 0.069 | 0.017 | -0.040 | -0.100 | 1.000 | 0.000 | 0.055 | -0.085 | 0.126 |
| Pclass | 0.268 | 0.206 | 0.494 | 0.020 | 0.000 | 1.000 | 0.163 | -0.083 | 0.310 |
| Sex | 0.030 | 0.081 | 0.190 | 0.305 | 0.055 | 0.163 | 1.000 | -0.202 | 0.574 |
| SibSp | -0.122 | -0.002 | 0.460 | 0.432 | -0.085 | -0.083 | -0.202 | 1.000 | 0.198 |
| Survived | 0.100 | 0.132 | 0.298 | 0.132 | 0.126 | 0.310 | 0.574 | 0.198 | 1.000 |
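To read the two matrices side by side, it can help to reduce each to its Survived row and compare. The sketch below is one way to do that, with the coefficients copied from the tables above. The helper names are illustrative, and the code makes no assumption about which association measure the report used for each pair of columns.

```python
# Survived row of each correlation matrix (self-correlation omitted).
survived = {
    "A": {"Age": 0.125, "Embarked": 0.172, "Fare": 0.307, "Parch": 0.154,
          "PassengerId": 0.052, "Pclass": 0.347, "Sex": 0.555, "SibSp": 0.148},
    "B": {"Age": 0.100, "Embarked": 0.132, "Fare": 0.298, "Parch": 0.132,
          "PassengerId": 0.126, "Pclass": 0.310, "Sex": 0.574, "SibSp": 0.198},
}


def strongest(row):
    """Return the variable with the largest absolute coefficient."""
    return max(row, key=lambda name: abs(row[name]))


def largest_shift(row_a, row_b):
    """Return the variable whose coefficient differs most between datasets."""
    return max(row_a, key=lambda name: abs(row_a[name] - row_b[name]))


for label, row in survived.items():
    top = strongest(row)
    print(f"Dataset {label}: strongest association with Survived is "
          f"{top} ({row[top]:.3f})")

shifted = largest_shift(survived["A"], survived["B"])
delta = survived["B"][shifted] - survived["A"][shifted]
print(f"Largest A-to-B change: {shifted} ({delta:+.3f})")
```

In both datasets the strongest entry is Sex, which is consistent with the high-correlation alert for Sex and Survived in the report overview.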
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 190 | 191 | 1 | 2 | Pinsky, Mrs. (Rosa) | female | 32.0 | 0 | 0 | 234604 | 13.0000 | NaN | S |
| 721 | 722 | 0 | 3 | Jensen, Mr. Svend Lauritz | male | 17.0 | 1 | 0 | 350048 | 7.0542 | NaN | S |
| 448 | 449 | 1 | 3 | Baclini, Miss. Marie Catherine | female | 5.0 | 2 | 1 | 2666 | 19.2583 | NaN | C |
| 695 | 696 | 0 | 2 | Chapman, Mr. Charles Henry | male | 52.0 | 0 | 0 | 248731 | 13.5000 | NaN | S |
| 287 | 288 | 0 | 3 | Naidenoff, Mr. Penko | male | 22.0 | 0 | 0 | 349206 | 7.8958 | NaN | S |
| 727 | 728 | 1 | 3 | Mannion, Miss. Margareth | female | NaN | 0 | 0 | 36866 | 7.7375 | NaN | Q |
| 317 | 318 | 0 | 2 | Moraweck, Dr. Ernest | male | 54.0 | 0 | 0 | 29011 | 14.0000 | NaN | S |
| 414 | 415 | 1 | 3 | Sundman, Mr. Johan Julian | male | 44.0 | 0 | 0 | STON/O 2. 3101269 | 7.9250 | NaN | S |
| 558 | 559 | 1 | 1 | Taussig, Mrs. Emil (Tillie Mandelbaum) | female | 39.0 | 1 | 1 | 110413 | 79.6500 | E67 | S |
| 706 | 707 | 1 | 2 | Kelly, Mrs. Florence "Fannie" | female | 45.0 | 0 | 0 | 223596 | 13.5000 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 105 | 106 | 0 | 3 | Mionoff, Mr. Stoytcho | male | 28.0 | 0 | 0 | 349207 | 7.8958 | NaN | S |
| 507 | 508 | 1 | 1 | Bradley, Mr. George ("George Arthur Brayton") | male | NaN | 0 | 0 | 111427 | 26.5500 | NaN | S |
| 864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S |
| 286 | 287 | 1 | 3 | de Mulder, Mr. Theodore | male | 30.0 | 0 | 0 | 345774 | 9.5000 | NaN | S |
| 672 | 673 | 0 | 2 | Mitchell, Mr. Henry Michael | male | 70.0 | 0 | 0 | C.A. 24580 | 10.5000 | NaN | S |
| 153 | 154 | 0 | 3 | van Billiard, Mr. Austin Blyler | male | 40.5 | 0 | 2 | A/5. 851 | 14.5000 | NaN | S |
| 329 | 330 | 1 | 1 | Hippach, Miss. Jean Gertrude | female | 16.0 | 0 | 1 | 111361 | 57.9792 | B18 | C |
| 50 | 51 | 0 | 3 | Panula, Master. Juha Niilo | male | 7.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S |
| 865 | 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42.0 | 0 | 0 | 236852 | 13.0000 | NaN | S |
| 774 | 775 | 1 | 2 | Hocking, Mrs. Elizabeth (Eliza Needs) | female | 54.0 | 1 | 3 | 29105 | 23.0000 | NaN | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 433 | 434 | 0 | 3 | Kallio, Mr. Nikolai Erland | male | 17.0 | 0 | 0 | STON/O 2. 3101274 | 7.1250 | NaN | S |
| 482 | 483 | 0 | 3 | Rouse, Mr. Richard Henry | male | 50.0 | 0 | 0 | A/5 3594 | 8.0500 | NaN | S |
| 804 | 805 | 1 | 3 | Hedman, Mr. Oskar Arvid | male | 27.0 | 0 | 0 | 347089 | 6.9750 | NaN | S |
| 591 | 592 | 1 | 1 | Stephenson, Mrs. Walter Bertram (Martha Eustis) | female | 52.0 | 1 | 0 | 36947 | 78.2667 | D20 | C |
| 388 | 389 | 0 | 3 | Sadlier, Mr. Matthew | male | NaN | 0 | 0 | 367655 | 7.7292 | NaN | Q |
| 382 | 383 | 0 | 3 | Tikkanen, Mr. Juho | male | 32.0 | 0 | 0 | STON/O 2. 3101293 | 7.9250 | NaN | S |
| 618 | 619 | 1 | 2 | Becker, Miss. Marion Louise | female | 4.0 | 2 | 1 | 230136 | 39.0000 | F4 | S |
| 74 | 75 | 1 | 3 | Bing, Mr. Lee | male | 32.0 | 0 | 0 | 1601 | 56.4958 | NaN | S |
| 521 | 522 | 0 | 3 | Vovk, Mr. Janko | male | 22.0 | 0 | 0 | 349252 | 7.8958 | NaN | S |
| 544 | 545 | 0 | 1 | Douglas, Mr. Walter Donald | male | 50.0 | 1 | 0 | PC 17761 | 106.4250 | C86 | C |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 847 | 848 | 0 | 3 | Markoff, Mr. Marin | male | 35.00 | 0 | 0 | 349213 | 7.8958 | NaN | C |
| 163 | 164 | 0 | 3 | Calic, Mr. Jovo | male | 17.00 | 0 | 0 | 315093 | 8.6625 | NaN | S |
| 221 | 222 | 0 | 2 | Bracken, Mr. James H | male | 27.00 | 0 | 0 | 220367 | 13.0000 | NaN | S |
| 803 | 804 | 1 | 3 | Thomas, Master. Assad Alexander | male | 0.42 | 0 | 1 | 2625 | 8.5167 | NaN | C |
| 237 | 238 | 1 | 2 | Collyer, Miss. Marjorie "Lottie" | female | 8.00 | 0 | 2 | C.A. 31921 | 26.2500 | NaN | S |
| 164 | 165 | 0 | 3 | Panula, Master. Eino Viljami | male | 1.00 | 4 | 1 | 3101295 | 39.6875 | NaN | S |
| 799 | 800 | 0 | 3 | Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert) | female | 30.00 | 1 | 1 | 345773 | 24.1500 | NaN | S |
| 187 | 188 | 1 | 1 | Romaine, Mr. Charles Hallace ("Mr C Rolmane") | male | 45.00 | 0 | 0 | 111428 | 26.5500 | NaN | S |
| 99 | 100 | 0 | 2 | Kantor, Mr. Sinai | male | 34.00 | 1 | 0 | 244367 | 26.0000 | NaN | S |
| 682 | 683 | 0 | 3 | Olsvigen, Mr. Thor Anderson | male | 20.00 | 0 | 0 | 6563 | 9.2250 | NaN | S |
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.